Keyword Extraction and Semantic Tag Prediction

Authors

  • James Hong
  • Michael Fang
Abstract

Content on the web is often organized through user-generated tags for intuitive search and retrieval. Such tags convey meta-information about the subject matter of the texts they represent. For this project, we applied machine learning (Bayesian co-occurrence, k-NN, SVM, NNS) to predict tags of StackExchange posts obtained from Kaggle: “Facebook Recruiting Keyword Extraction III.” Using our non-parametric, fuzzy Nearest Neighbor Search (NNS) algorithm, we achieved an F1-score of 0.471 on a test set of unseen data and 0.773 on the Kaggle test set (which contains many duplicate data points). Furthermore, when predicting a single tag per post, our algorithm attained approximately 71.1% accuracy on unseen data, surpassing the 65% accuracy attained by Stanley & Byrne (2013). Our keyword-tag co-occurrence model and fuzzy NNS proved to be fast and practical for large-scale subject and tag prediction problems with tens of thousands of tags and training documents.
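As a rough illustration of the F1-score reported above, the following minimal sketch computes an F1 between a predicted and a true tag set for one post, then averages it over posts. The abstract does not say how the score was averaged, so the per-post averaging is an assumption, and the names `tag_f1` and `mean_f1` are hypothetical helpers, not taken from the paper.

```python
def tag_f1(predicted, actual):
    """F1 score between a predicted and a true tag set for one post."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        # No overlap (or an empty set): precision and recall are both zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, truths):
    """Average the per-post F1 over a collection of posts (assumed averaging)."""
    scores = [tag_f1(p, t) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, predicting the tags `python` and `list` for a post whose true tags are `python`, `sorting`, and `list` gives a precision of 1, a recall of 2/3, and therefore an F1 of 0.8.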

Similar resources

Some applications of a statistical tagger for Swedish

We will briefly describe a part-of-speech (POS) tagger for Swedish and discuss some applications: rule-based and probabilistic grammar checking, word prediction and keyword extraction. In POS tagging of a text, each word and punctuation mark in the text is assigned a morphosyntactic tag. We have designed and implemented a tagger based on a second order Hidden Markov Model [1]. Given a sequence o...

Optimizing title and Meta tags based on distribution of keywords; Lexical and semantic approaches

Problem statement: To increase traffic on websites, Search Engine Optimization (SEO) has provided many costly and time-consuming options. One problem is the inadequate distribution of keywords especially those keywords that users use the title tag and Meta tags. Approach: This study described work on an initial model for handling some of the SEO factors to increase the distribution of keywords....

MIKE: An Interactive Microblogging Keyword Extractor using Contextual Semantic Smoothing

Social media, such as tweets on Twitter and Short Message Service (SMS) messages on cellular networks, are short-length textual documents (short texts or microblog posts) exchanged among users on the Web and/or their mobile devices. Automatic keyword extraction from short texts can be applied in online applications such as tag recommendation and contextual advertising. In this paper we present ...

Exploring the Value of Folksonomies for Creating Semantic Metadata

Finding good keywords to describe resources is an on-going problem. Typically, we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well-populated source of unstructured tags describing Web resources. This article explores the value of the folksonomy tags as a potential source of keyword meta...

Finding User Semantics on the Web using Word Co-occurrence Information

With the currently growing interest in the Semantic Web, describing user semantics to model users and their social relationships is coming to play an important role. This paper proposes a novel keyword extraction method to extract user semantics from the Web. Based on co-occurrence information of words, the proposed method extracts relevant keywords depending on the context of a person. Our eva...

Publication date: 2013